Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 233538 |
| Missing cells | 36127 |
| Missing cells (%) | 1.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 24.9 MiB |
| Average record size in memory | 112.0 B |
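The derived figures in this table can be checked against the raw counts. The sketch below is illustrative only: it assumes the missing-cell percentage is the number of missing cells divided by observations times variables, and that total memory is the average record size times the number of observations. The variable names are made up for this example.

```python
# Figures copied from the dataset statistics table above.
n_variables = 14
n_observations = 233538
missing_cells = 36127
avg_record_bytes = 112.0

# Assumed definitions (not stated in the report itself).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
total_mib = avg_record_bytes * n_observations / 2**20  # bytes -> MiB

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Under these assumptions both values round to the figures reported in the table.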
Variable types
| NUM | 9 |
|---|---|
| CAT | 3 |
| BOOL | 1 |
| DATE | 1 |
Reproduction
| Analysis started | 2020-05-12 11:19:29.745409 |
|---|---|
| Analysis finished | 2020-05-12 11:19:58.682968 |
| Duration | 28.94 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
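The reported duration is the difference between the two timestamps. A minimal sketch using only the Python standard library, with the timestamp format string assumed from the way the values are printed above:

```python
from datetime import datetime

# Format assumed from the timestamps shown in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2020-05-12 11:19:29.745409", FMT)
finished = datetime.strptime("2020-05-12 11:19:58.682968", FMT)

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```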
Warnings
| Warning | Type |
|---|---|
| VERSIE has constant value "1.0" | Constant |
| DATUM_BESTAND has constant value "2020-04-24" | Constant |
| PEILDATUM has constant value "2020-04-01" | Constant |
| TYPERENDE_DIAGNOSE_CD has a high cardinality: 1766 distinct values | High cardinality |
| AANTAL_SUBTRAJECT_PER_ZPD is highly correlated with AANTAL_PAT_PER_ZPD | High correlation |
| AANTAL_PAT_PER_ZPD is highly correlated with AANTAL_SUBTRAJECT_PER_ZPD | High correlation |
| AANTAL_SUBTRAJECT_PER_DIAG is highly correlated with AANTAL_PAT_PER_DIAG | High correlation |
| AANTAL_PAT_PER_DIAG is highly correlated with AANTAL_SUBTRAJECT_PER_DIAG | High correlation |
| AANTAL_SUBTRAJECT_PER_SPC is highly correlated with AANTAL_PAT_PER_SPC | High correlation |
| AANTAL_PAT_PER_SPC is highly correlated with AANTAL_SUBTRAJECT_PER_SPC | High correlation |
| GEMIDDELDE_VERKOOPPRIJS has 36127 (15.5%) missing values | Missing |
| AANTAL_SUBTRAJECT_PER_ZPD is highly skewed (γ1 = 20.87549947) | Skewed |
VERSIE
| Distinct count | 1 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.8 MiB |
| Value | Count | Frequency (%) | |
| 1 | 233538 | 100.0% |
DATUM_BESTAND
| Distinct count | 1 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.8 MiB |
| Value | Count | Frequency (%) | |
| 2020-04-24 | 233538 | 100.0% |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
PEILDATUM
| Distinct count | 1 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.8 MiB |
| Value | Count | Frequency (%) | |
| 2020-04-01 | 233538 | 100.0% |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
JAAR
Date
| Distinct count | 9 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.8 MiB |
| Minimum | 2012-01-01 00:00:00 |
|---|---|
| Maximum | 2020-01-01 00:00:00 |
Histogram
BEHANDELEND_SPECIALISME_CD
Real number (ℝ≥0)
| Distinct count | 27 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 422.5759405321618 |
|---|---|
| Minimum | 301 |
| Maximum | 8418 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 301 |
|---|---|
| 5-th percentile | 302 |
| Q1 | 305 |
| median | 313 |
| Q3 | 322 |
| 95-th percentile | 361 |
| Maximum | 8418 |
| Range | 8117 |
| Interquartile range (IQR) | 17 |
Descriptive statistics
| Standard deviation | 924.108952 |
|---|---|
| Coefficient of variation (CV) | 2.186847057 |
| Kurtosis | 70.71843912 |
| Mean | 422.5759405 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 8.520739096 |
| Sum | 98687540 |
| Variance | 853977.3551 |
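Several of the descriptive statistics above are simple functions of one another. The following sketch re-derives them for BEHANDELEND_SPECIALISME_CD from the rounded values printed in this report, assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, range = maximum - minimum, IQR = Q3 - Q1). Small differences in the last digits come from the rounded inputs.

```python
# Values copied from the quantile and descriptive statistics tables.
mean = 422.5759405
std = 924.108952
minimum, q1, q3, maximum = 301, 305, 322, 8418

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance from the standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(f"CV = {cv:.6f}")
print(f"Variance = {variance:.2f}")
print(f"Range = {value_range}, IQR = {iqr}")
```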
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 305 | 33114 | 14.2% | |
| 313 | 30183 | 12.9% | |
| 303 | 26827 | 11.5% | |
| 330 | 18786 | 8.0% | |
| 316 | 15990 | 6.8% | |
| 308 | 11816 | 5.1% | |
| 324 | 9746 | 4.2% | |
| 306 | 9649 | 4.1% | |
| 301 | 9407 | 4.0% | |
| 304 | 7565 | 3.2% | |
| Other values (17) | 60455 | 25.9% |
| Value | Count | Frequency (%) | |
| 301 | 9407 | 4.0% | |
| 302 | 5017 | 2.1% | |
| 303 | 26827 | 11.5% | |
| 304 | 7565 | 3.2% | |
| 305 | 33114 | 14.2% |
| Value | Count | Frequency (%) | |
| 8418 | 3072 | 1.3% | |
| 1900 | 153 | 0.1% | |
| 390 | 575 | 0.2% | |
| 389 | 2548 | 1.1% | |
| 362 | 3797 | 1.6% |
TYPERENDE_DIAGNOSE_CD
| Distinct count | 1766 |
|---|---|
| Unique (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.8 MiB |
| Value | Count | Frequency (%) | |
| 101 | 980 | 0.4% | |
| 402 | 954 | 0.4% | |
| 403 | 926 | 0.4% | |
| 301 | 926 | 0.4% | |
| 203 | 880 | 0.4% | |
| 201 | 876 | 0.4% | |
| 401 | 786 | 0.3% | |
| 404 | 775 | 0.3% | |
| 802 | 767 | 0.3% | |
| 409 | 753 | 0.3% | |
| Other values (1756) | 224915 | 96.3% |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.350152866 |
| Min length | 2 |
ZORGPRODUCT_CD
Real number (ℝ≥0)
| Distinct count | 5880 |
|---|---|
| Unique (%) | 2.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 441766792.3267948 |
|---|---|
| Minimum | 10501002 |
| Maximum | 998418081 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 10501002 |
|---|---|
| 5-th percentile | 28999036 |
| Q1 | 99799050 |
| median | 149599030 |
| Q3 | 990004006 |
| 95-th percentile | 990416029 |
| Maximum | 998418081 |
| Range | 987917079 |
| Interquartile range (IQR) | 890204956 |
Descriptive statistics
| Standard deviation | 429373913.8 |
|---|---|
| Coefficient of variation (CV) | 0.9719470119 |
| Kurtosis | -1.74209471 |
| Mean | 441766792.3 |
| Median Absolute Deviation (MAD) | 119700025 |
| Skewness | 0.4626516201 |
| Sum | 1.031693331e+14 |
| Variance | 1.843619578e+17 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 990004009 | 1687 | 0.7% | |
| 990004007 | 1664 | 0.7% | |
| 990003004 | 1663 | 0.7% | |
| 990004006 | 1345 | 0.6% | |
| 990356076 | 1165 | 0.5% | |
| 990356073 | 1083 | 0.5% | |
| 990003007 | 1066 | 0.5% | |
| 131999228 | 1009 | 0.4% | |
| 131999164 | 1000 | 0.4% | |
| 199299013 | 964 | 0.4% | |
| Other values (5870) | 220892 | 94.6% |
| Value | Count | Frequency (%) | |
| 10501002 | 6 | < 0.1% | |
| 10501003 | 8 | < 0.1% | |
| 10501004 | 8 | < 0.1% | |
| 10501005 | 8 | < 0.1% | |
| 10501007 | 3 | < 0.1% |
| Value | Count | Frequency (%) | |
| 998418081 | 112 | < 0.1% | |
| 998418080 | 97 | < 0.1% | |
| 998418079 | 26 | < 0.1% | |
| 998418077 | 5 | < 0.1% | |
| 998418076 | 5 | < 0.1% |
AANTAL_PAT_PER_ZPD
| Distinct count | 8475 |
|---|---|
| Unique (%) | 3.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 504.21537394342675 |
|---|---|
| Minimum | 1 |
| Maximum | 152465 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 14 |
| Q3 | 102 |
| 95-th percentile | 1706 |
| Maximum | 152465 |
| Range | 152464 |
| Interquartile range (IQR) | 99 |
Descriptive statistics
| Standard deviation | 3091.237089 |
|---|---|
| Coefficient of variation (CV) | 6.130787059 |
| Kurtosis | 374.465047 |
| Mean | 504.2153739 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | 16.20594033 |
| Sum | 117753450 |
| Variance | 9555746.743 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 1 | 38661 | 16.6% | |
| 2 | 19000 | 8.1% | |
| 3 | 12303 | 5.3% | |
| 4 | 9073 | 3.9% | |
| 5 | 7065 | 3.0% | |
| 6 | 5927 | 2.5% | |
| 7 | 4927 | 2.1% | |
| 8 | 4168 | 1.8% | |
| 9 | 3875 | 1.7% | |
| 10 | 3376 | 1.4% | |
| Other values (8465) | 125163 | 53.6% |
| Value | Count | Frequency (%) | |
| 1 | 38661 | 16.6% | |
| 2 | 19000 | 8.1% | |
| 3 | 12303 | 5.3% | |
| 4 | 9073 | 3.9% | |
| 5 | 7065 | 3.0% |
| Value | Count | Frequency (%) | |
| 152465 | 1 | < 0.1% | |
| 147127 | 1 | < 0.1% | |
| 144491 | 1 | < 0.1% | |
| 108986 | 1 | < 0.1% | |
| 108942 | 1 | < 0.1% |
AANTAL_SUBTRAJECT_PER_ZPD
| Distinct count | 9029 |
|---|---|
| Unique (%) | 3.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 585.8709460558881 |
|---|---|
| Minimum | 1 |
| Maximum | 239632 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 14 |
| Q3 | 111 |
| 95-th percentile | 1928 |
| Maximum | 239632 |
| Range | 239631 |
| Interquartile range (IQR) | 108 |
Descriptive statistics
| Standard deviation | 3882.47443 |
|---|---|
| Coefficient of variation (CV) | 6.626842406 |
| Kurtosis | 704.8303162 |
| Mean | 585.8709461 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | 20.87549947 |
| Sum | 136823129 |
| Variance | 15073607.7 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 1 | 37293 | 16.0% | |
| 2 | 18653 | 8.0% | |
| 3 | 12189 | 5.2% | |
| 4 | 8933 | 3.8% | |
| 5 | 6998 | 3.0% | |
| 6 | 5897 | 2.5% | |
| 7 | 4943 | 2.1% | |
| 8 | 4145 | 1.8% | |
| 9 | 3801 | 1.6% | |
| 10 | 3401 | 1.5% | |
| Other values (9019) | 127285 | 54.5% |
| Value | Count | Frequency (%) | |
| 1 | 37293 | 16.0% | |
| 2 | 18653 | 8.0% | |
| 3 | 12189 | 5.2% | |
| 4 | 8933 | 3.8% | |
| 5 | 6998 | 3.0% |
| Value | Count | Frequency (%) | |
| 239632 | 1 | < 0.1% | |
| 231931 | 1 | < 0.1% | |
| 229679 | 1 | < 0.1% | |
| 226567 | 1 | < 0.1% | |
| 218433 | 1 | < 0.1% |
AANTAL_PAT_PER_DIAG
| Distinct count | 7328 |
|---|---|
| Unique (%) | 3.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7645.161339910422 |
|---|---|
| Minimum | 1 |
| Maximum | 208424 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 42 |
| Q1 | 416 |
| median | 1725 |
| Q3 | 6467 |
| 95-th percentile | 36535 |
| Maximum | 208424 |
| Range | 208423 |
| Interquartile range (IQR) | 6051 |
Descriptive statistics
| Standard deviation | 17521.54892 |
|---|---|
| Coefficient of variation (CV) | 2.291848155 |
| Kurtosis | 31.50762579 |
| Mean | 7645.16134 |
| Median Absolute Deviation (MAD) | 1562 |
| Skewness | 4.914266766 |
| Sum | 1785435689 |
| Variance | 307004676.4 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 1 | 377 | 0.2% | |
| 19 | 371 | 0.2% | |
| 37 | 354 | 0.2% | |
| 20 | 354 | 0.2% | |
| 11 | 340 | 0.1% | |
| 9 | 334 | 0.1% | |
| 12 | 330 | 0.1% | |
| 2 | 327 | 0.1% | |
| 32 | 324 | 0.1% | |
| 21 | 322 | 0.1% | |
| Other values (7318) | 230105 | 98.5% |
| Value | Count | Frequency (%) | |
| 1 | 377 | 0.2% | |
| 2 | 327 | 0.1% | |
| 3 | 269 | 0.1% | |
| 4 | 296 | 0.1% | |
| 5 | 276 | 0.1% |
| Value | Count | Frequency (%) | |
| 208424 | 19 | < 0.1% | |
| 203213 | 25 | < 0.1% | |
| 202452 | 17 | < 0.1% | |
| 200163 | 16 | < 0.1% | |
| 198510 | 17 | < 0.1% |
AANTAL_SUBTRAJECT_PER_DIAG
| Distinct count | 8099 |
|---|---|
| Unique (%) | 3.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10662.092327586945 |
|---|---|
| Minimum | 1 |
| Maximum | 336711 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 52 |
| Q1 | 541 |
| median | 2350 |
| Q3 | 8820 |
| 95-th percentile | 50809 |
| Maximum | 336711 |
| Range | 336710 |
| Interquartile range (IQR) | 8279 |
Descriptive statistics
| Standard deviation | 25178.36648 |
|---|---|
| Coefficient of variation (CV) | 2.361484567 |
| Kurtosis | 35.71828716 |
| Mean | 10662.09233 |
| Median Absolute Deviation (MAD) | 2152 |
| Skewness | 5.189359763 |
| Sum | 2490003718 |
| Variance | 633950138.8 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 1 | 332 | 0.1% | |
| 13 | 279 | 0.1% | |
| 38 | 278 | 0.1% | |
| 34 | 271 | 0.1% | |
| 2 | 264 | 0.1% | |
| 93 | 263 | 0.1% | |
| 11 | 259 | 0.1% | |
| 22 | 259 | 0.1% | |
| 46 | 259 | 0.1% | |
| 20 | 257 | 0.1% | |
| Other values (8089) | 230817 | 98.8% |
| Value | Count | Frequency (%) | |
| 1 | 332 | 0.1% | |
| 2 | 264 | 0.1% | |
| 3 | 257 | 0.1% | |
| 4 | 235 | 0.1% | |
| 5 | 218 | 0.1% |
| Value | Count | Frequency (%) | |
| 336711 | 19 | < 0.1% | |
| 326565 | 25 | < 0.1% | |
| 323153 | 20 | < 0.1% | |
| 293735 | 17 | < 0.1% | |
| 289859 | 17 | < 0.1% |
AANTAL_PAT_PER_SPC
| Distinct count | 234 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 669125.3314878093 |
|---|---|
| Minimum | 4 |
| Maximum | 1489527 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 45746 |
| Q1 | 286060 |
| median | 744725 |
| Q3 | 995553 |
| 95-th percentile | 1334838 |
| Maximum | 1489527 |
| Range | 1489523 |
| Interquartile range (IQR) | 709493 |
Descriptive statistics
| Standard deviation | 410521.7659 |
|---|---|
| Coefficient of variation (CV) | 0.6135199889 |
| Kurtosis | -1.024536983 |
| Mean | 669125.3315 |
| Median Absolute Deviation (MAD) | 301942 |
| Skewness | 0.04354448669 |
| Sum | 1.562661917e+11 |
| Variance | 1.685281203e+11 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 880991 | 5102 | 2.2% | |
| 873252 | 4354 | 1.9% | |
| 843723 | 4348 | 1.9% | |
| 886628 | 4324 | 1.9% | |
| 828119 | 4204 | 1.8% | |
| 1077849 | 3887 | 1.7% | |
| 1064039 | 3851 | 1.6% | |
| 1028641 | 3828 | 1.6% | |
| 1040347 | 3810 | 1.6% | |
| 980824 | 3757 | 1.6% | |
| Other values (224) | 192073 | 82.2% |
| Value | Count | Frequency (%) | |
| 4 | 3 | < 0.1% | |
| 7 | 8 | < 0.1% | |
| 10 | 4 | < 0.1% | |
| 20 | 28 | < 0.1% | |
| 24 | 35 | < 0.1% |
| Value | Count | Frequency (%) | |
| 1489527 | 2976 | 1.3% | |
| 1450619 | 3054 | 1.3% | |
| 1421891 | 3564 | 1.5% | |
| 1334838 | 3540 | 1.5% | |
| 1331322 | 3547 | 1.5% |
AANTAL_SUBTRAJECT_PER_SPC
| Distinct count | 236 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1048057.1340252978 |
|---|---|
| Minimum | 4 |
| Maximum | 2538563 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 49064 |
| Q1 | 458322 |
| median | 1065599 |
| Q3 | 1719589 |
| 95-th percentile | 2185552 |
| Maximum | 2538563 |
| Range | 2538559 |
| Interquartile range (IQR) | 1261267 |
Descriptive statistics
| Standard deviation | 698522.2421 |
|---|---|
| Coefficient of variation (CV) | 0.6664925217 |
| Kurtosis | -0.8986604022 |
| Mean | 1048057.134 |
| Median Absolute Deviation (MAD) | 628825 |
| Skewness | 0.2923463218 |
| Sum | 2.44761167e+11 |
| Variance | 4.879333227e+11 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 1211567 | 5102 | 2.2% | |
| 1279749 | 4354 | 1.9% | |
| 1215804 | 4348 | 1.9% | |
| 1300629 | 4324 | 1.9% | |
| 1208398 | 4204 | 1.8% | |
| 2538563 | 3887 | 1.7% | |
| 2489495 | 3851 | 1.6% | |
| 2435526 | 3828 | 1.6% | |
| 2067486 | 3810 | 1.6% | |
| 2185552 | 3757 | 1.6% | |
| Other values (226) | 192073 | 82.2% |
| Value | Count | Frequency (%) | |
| 4 | 3 | < 0.1% | |
| 8 | 8 | < 0.1% | |
| 10 | 4 | < 0.1% | |
| 21 | 14 | < 0.1% | |
| 24 | 11 | < 0.1% |
| Value | Count | Frequency (%) | |
| 2538563 | 3887 | 1.7% | |
| 2489495 | 3851 | 1.6% | |
| 2435526 | 3828 | 1.6% | |
| 2185552 | 3757 | 1.6% | |
| 2067486 | 3810 | 1.6% |
GEMIDDELDE_VERKOOPPRIJS
| Distinct count | 3032 |
|---|---|
| Unique (%) | 1.5% |
| Missing | 36127 |
| Missing (%) | 15.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3496.6998292901612 |
|---|---|
| Minimum | 70.0 |
| Maximum | 287220.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.8 MiB |
Quantile statistics
| Minimum | 70 |
|---|---|
| 5-th percentile | 140 |
| Q1 | 460 |
| median | 1235 |
| Q3 | 4015 |
| 95-th percentile | 13215 |
| Maximum | 287220 |
| Range | 287150 |
| Interquartile range (IQR) | 3555 |
Descriptive statistics
| Standard deviation | 6643.232732 |
|---|---|
| Coefficient of variation (CV) | 1.899857882 |
| Kurtosis | 180.997742 |
| Mean | 3496.699829 |
| Median Absolute Deviation (MAD) | 1000 |
| Skewness | 8.182229188 |
| Sum | 690287010 |
| Variance | 44132541.13 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 160 | 1851 | 0.8% | |
| 105 | 1704 | 0.7% | |
| 110 | 1416 | 0.6% | |
| 180 | 1331 | 0.6% | |
| 300 | 1214 | 0.5% | |
| 140 | 1182 | 0.5% | |
| 145 | 1023 | 0.4% | |
| 295 | 997 | 0.4% | |
| 500 | 992 | 0.4% | |
| 185 | 970 | 0.4% | |
| Other values (3022) | 184731 | 79.1% | |
| (Missing) | 36127 | 15.5% |
| Value | Count | Frequency (%) | |
| 70 | 226 | 0.1% | |
| 75 | 74 | < 0.1% | |
| 80 | 359 | 0.2% | |
| 85 | 828 | 0.4% | |
| 90 | 441 | 0.2% |
| Value | Count | Frequency (%) | |
| 287220 | 8 | < 0.1% | |
| 148910 | 3 | < 0.1% | |
| 142880 | 4 | < 0.1% | |
| 122155 | 4 | < 0.1% | |
| 116765 | 3 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the angle of the line to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
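The calculation described above (covariance divided by the product of the standard deviations) can be sketched in plain Python. This is an illustration, not the code the profiling tool runs; `pearson_r` is a hypothetical helper name, and the sample values are the AANTAL_PAT_PER_ZPD and AANTAL_SUBTRAJECT_PER_ZPD columns of the first ten rows shown in this report.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of covariance and variances cancel, so sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant column has zero variance and raises ZeroDivisionError here.
    return cov / sqrt(var_x * var_y)

# First ten rows of the two ZPD count columns from this report.
pat = [1979, 5, 1, 9, 21, 1, 9, 19, 86293, 31]
sub = [2012, 5, 1, 9, 22, 1, 10, 21, 97955, 31]

r = pearson_r(pat, sub)
# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([2 * x + 3 for x in pat], sub)
print(r, r_rescaled)
```

On this small sample r is very close to +1, in line with the "High correlation" warning for these two columns, although ten rows are of course not representative of the full dataset.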
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
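As a rough illustration of the rank-based calculation, the sketch below replaces each value by its rank and then applies the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their ranks is an assumption of this example (a common convention), and the helper names `average_ranks` and `spearman_rho` are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the rank variables."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# A nonlinear but strictly increasing relationship (y = x**3).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
print(rho)
```

Because the relationship is strictly increasing, the ranks of x and y coincide and ρ reaches its maximum even though the relationship is not linear.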
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
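The pair-counting definition given above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition with a hypothetical helper `kendall_tau_a`. Note that it is an illustration only: libraries such as SciPy default to tau-b, which additionally corrects for ties, so values may differ on data with many tied observations.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 is a tie in x or y and counts as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# One swapped pair among four observations: 5 concordant, 1 discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

The double loop over all pairs is quadratic in the number of observations, which is fine for a small example but slow for a dataset of this report's size.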
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.
First rows
| | VERSIE | DATUM_BESTAND | PEILDATUM | JAAR | BEHANDELEND_SPECIALISME_CD | TYPERENDE_DIAGNOSE_CD | ZORGPRODUCT_CD | AANTAL_PAT_PER_ZPD | AANTAL_SUBTRAJECT_PER_ZPD | AANTAL_PAT_PER_DIAG | AANTAL_SUBTRAJECT_PER_DIAG | AANTAL_PAT_PER_SPC | AANTAL_SUBTRAJECT_PER_SPC | GEMIDDELDE_VERKOOPPRIJS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 2020-04-24 | 2020-04-01 | 2012-01-01 | 301 | 751 | 79799033 | 1979 | 2012 | 103092 | 119258 | 1296750 | 1856605 | 255.0 |
| 1 | 1.0 | 2020-04-24 | 2020-04-01 | 2012-01-01 | 301 | 751 | 79799003 | 5 | 5 | 103092 | 119258 | 1296750 | 1856605 | 2165.0 |
| 2 | 1.0 | 2020-04-24 | 2020-04-01 | 2012-01-01 | 301 | 751 | 79799013 | 1 | 1 | 103092 | 119258 | 1296750 | 1856605 | NaN |
| 3 | 1.0 | 2020-04-24 | 2020-04-01 | 2012-01-01 | 301 | 751 | 79799041 | 9 | 9 | 103092 | 119258 | 1296750 | 1856605 | NaN |
| 4 | 1.0 | 2020-04-24 | 2020-04-01 | 2012-01-01 | 301 | 751 | 79799007 | 21 | 22 | 103092 | 119258 | 1296750 | 1856605 | 575.0 |
| 5 | 1.0 | 2020-04-24 | 2020-04-01 | 2012-01-01 | 301 | 751 | 79799045 | 1 | 1 | 103092 | 119258 | 1296750 | 1856605 | NaN |
| 6 | 1.0 | 2020-04-24 | 2020-04-01 | 2012-01-01 | 301 | 751 | 79799046 | 9 | 10 | 103092 | 119258 | 1296750 | 1856605 | NaN |
| 7 | 1.0 | 2020-04-24 | 2020-04-01 | 2012-01-01 | 301 | 751 | 79799025 | 19 | 21 | 103092 | 119258 | 1296750 | 1856605 | 570.0 |
| 8 | 1.0 | 2020-04-24 | 2020-04-01 | 2012-01-01 | 301 | 751 | 79799037 | 86293 | 97955 | 103092 | 119258 | 1296750 | 1856605 | 75.0 |
| 9 | 1.0 | 2020-04-24 | 2020-04-01 | 2012-01-01 | 301 | 751 | 79799012 | 31 | 31 | 103092 | 119258 | 1296750 | 1856605 | 680.0 |
Last rows
| | VERSIE | DATUM_BESTAND | PEILDATUM | JAAR | BEHANDELEND_SPECIALISME_CD | TYPERENDE_DIAGNOSE_CD | ZORGPRODUCT_CD | AANTAL_PAT_PER_ZPD | AANTAL_SUBTRAJECT_PER_ZPD | AANTAL_PAT_PER_DIAG | AANTAL_SUBTRAJECT_PER_DIAG | AANTAL_PAT_PER_SPC | AANTAL_SUBTRAJECT_PER_SPC | GEMIDDELDE_VERKOOPPRIJS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 233528 | 1.0 | 2020-04-24 | 2020-04-01 | 2018-01-01 | 327 | 0112 | 990027198 | 894 | 1473 | 942 | 1728 | 184576 | 336251 | 220.0 |
| 233529 | 1.0 | 2020-04-24 | 2020-04-01 | 2018-01-01 | 327 | 0613 | 990027185 | 84 | 85 | 3516 | 4908 | 184576 | 336251 | 14230.0 |
| 233530 | 1.0 | 2020-04-24 | 2020-04-01 | 2018-01-01 | 327 | 0613 | 990027181 | 168 | 170 | 3516 | 4908 | 184576 | 336251 | 18095.0 |
| 233531 | 1.0 | 2020-04-24 | 2020-04-01 | 2018-01-01 | 327 | 0613 | 990027131 | 47 | 48 | 3516 | 4908 | 184576 | 336251 | 165.0 |
| 233532 | 1.0 | 2020-04-24 | 2020-04-01 | 2018-01-01 | 327 | 0613 | 990027199 | 544 | 571 | 3516 | 4908 | 184576 | 336251 | 850.0 |
| 233533 | 1.0 | 2020-04-24 | 2020-04-01 | 2018-01-01 | 327 | 0613 | 990027180 | 37 | 37 | 3516 | 4908 | 184576 | 336251 | NaN |
| 233534 | 1.0 | 2020-04-24 | 2020-04-01 | 2018-01-01 | 327 | 0613 | 990027198 | 1314 | 1574 | 3516 | 4908 | 184576 | 336251 | 220.0 |
| 233535 | 1.0 | 2020-04-24 | 2020-04-01 | 2018-01-01 | 327 | 0613 | 990027179 | 4 | 4 | 3516 | 4908 | 184576 | 336251 | NaN |
| 233536 | 1.0 | 2020-04-24 | 2020-04-01 | 2018-01-01 | 327 | 0613 | 990027186 | 2102 | 2344 | 3516 | 4908 | 184576 | 336251 | 3350.0 |
| 233537 | 1.0 | 2020-04-24 | 2020-04-01 | 2018-01-01 | 327 | 0613 | 990027182 | 73 | 75 | 3516 | 4908 | 184576 | 336251 | NaN |